
Extracting this dataset is computationally intensive and produces large intermediate files. The datasets were prepared by first converting XML data dumps of revisions into TSV files and then indexing them. Views were corrected for redirects. I then drew samples and saved them as .RData files. To repeat the analysis, run weightedStandalone.R.
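
As an illustration of what correcting views for redirects can look like, here is a minimal sketch in R. It assumes that the correction means crediting the views of a redirect page to its target. The data frames, column names (title, views, from, to) and toy numbers are hypothetical and are not taken from the actual pipeline.

```r
# Hypothetical toy data; the real TSV schema may differ.
page_views <- data.frame(
  title = c("Foo", "Foo_(old)", "Bar"),
  views = c(100, 20, 50),
  stringsAsFactors = FALSE
)

# Hypothetical redirect table: each row maps a redirect title to its target.
redirects <- data.frame(
  from = "Foo_(old)",
  to   = "Foo",
  stringsAsFactors = FALSE
)

# Resolve each title to its redirect target; titles that are not
# redirects resolve to themselves.
idx <- match(page_views$title, redirects$from)
page_views$target <- ifelse(is.na(idx), page_views$title, redirects$to[idx])

# Sum the views per resolved target, so redirect traffic is credited
# to the page it points to.
corrected <- aggregate(views ~ target, data = page_views, FUN = sum)
print(corrected)
```

This sketch resolves only a single redirect hop. Chains of redirects would need the lookup to be repeated until no title changes.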


